Commas Recovery with Syntactic Features in French and in Czech

Authors

  • Christophe Cerisara
  • Pavel Král
  • Claire Gardent
Abstract

Automatic speech transcripts can be made more readable and more useful for further processing by enriching them with punctuation marks and other meta-linguistic information. In this work we study how to improve the automatic recovery of one of the most difficult punctuation marks, the comma, in French and in Czech. We show that comma detection performance improves substantially in both languages when syntactic features derived from dependency structures are integrated into our baseline Conditional Random Field model. We further study the relative impact of language-independent and language-specific features, and show that combining the two gives the largest improvement. Finally, we discuss the robustness of these features to speech recognition errors.
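The abstract does not list the features themselves, so the following is only a minimal sketch of what per-token features drawn from a dependency structure could look like when comma recovery is framed as sequence labeling (does a comma follow this token?). The `Token` fields, the feature names, the dependency labels, and the "arcs over the gap" feature are all illustrative assumptions, not the authors' feature set.

```python
from typing import Dict, List, NamedTuple, Union


class Token(NamedTuple):
    form: str    # word form as transcribed (no punctuation)
    pos: str     # part-of-speech tag
    head: int    # index of the syntactic head, -1 for the root
    deprel: str  # dependency label linking the token to its head


def arcs_over_gap(sent: List[Token], i: int) -> int:
    """Count dependency arcs that span the gap between token i and token i+1."""
    count = 0
    for j, tok in enumerate(sent):
        if tok.head < 0:
            continue  # the root has no incoming arc
        lo, hi = min(j, tok.head), max(j, tok.head)
        if lo <= i < hi:
            count += 1
    return count


def token_features(sent: List[Token], i: int) -> Dict[str, Union[str, int]]:
    """Hypothetical feature dictionary for the position right after token i."""
    tok = sent[i]
    next_pos = sent[i + 1].pos if i + 1 < len(sent) else "<EOS>"
    return {
        # lexical and morphosyntactic features
        "form": tok.form.lower(),
        "pos": tok.pos,
        "next_pos": next_pos,
        # features read off the dependency structure (assumed, for illustration)
        "deprel": tok.deprel,
        "head_offset": tok.head - i if tok.head >= 0 else 0,
        "arcs_over_gap": arcs_over_gap(sent, i),
    }


# Toy French example with made-up annotations: "hier soir il est parti"
sentence = [
    Token("hier", "ADV", 1, "mod"),
    Token("soir", "NOUN", 4, "mod"),
    Token("il", "PRON", 4, "suj"),
    Token("est", "AUX", 4, "aux"),
    Token("parti", "VERB", -1, "root"),
]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

Each sentence becomes a list of feature dictionaries, one per token, which is a shape that common CRF toolkits accept as input. The intuition behind a feature such as `arcs_over_gap` is that few dependency arcs crossing a word boundary may hint at a phrase boundary where a comma is plausible; whether this particular feature helps is not something the abstract states.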

Similar resources

Induction of Fine-Grained Part-of-Speech Taggers via Classifier Combination and Crosslingual Projection

This paper presents an original approach to part-of-speech tagging of fine-grained features (such as case, aspect, and adjective person/number) in languages such as English where these properties are generally not morphologically marked. The goals of such rich lexical tagging in English are to provide additional features for word alignment models in bilingual corpora (for statistical machine tr...

Comparison of the Speech Syntactic Features between Hearing-Impaired and Normal Hearing Children

Introduction: The present study seeks to describe and analyze the syntactic features of children with severe hearing loss who had access to hearing aids compared with children with normal hearing, assigning them to the same separate gender classes. Materials and Methods: In the present study, eight children with severe hearing impairment who used a hearing aid and eight hearing children...

Labeling the Semantic Roles of Commas

Commas and the surrounding sentence structure often express relations that are essential to understanding the meaning of the sentence. This paper proposes a set of relations commas participate in, expanding on previous work in this area, and develops a new dataset annotated with this set of labels. We identify features that are important to achieve a good performance on comma labeling and then ...

Modeling Comma Placement in Chinese Text for Better Readability using Linguistic Features and Gaze Information

Comma placements in Chinese text are relatively arbitrary although there are some syntactic guidelines for them. In this research, we attempt to improve the readability of text by optimizing comma placements through integration of linguistic features of text and gaze features of readers. We design a comma predictor for general Chinese text based on conditional random field models with linguisti...

Verbs in Applied Linguistics Research Article Introductions: Semantic and syntactic analysis

This study aims to investigate the semantic and syntactic features of verbs used in the introduction section of Applied Linguistics research articles published in Iranian and international journals. A corpus of 20 research article introductions (10 from each journal) was used. The corpus was analysed for the syntactic features (tense, aspect and voice) and semantic meaning of verbs. The finding...

Publication date: 2011